DROPS

Document

DOI: 10.4230/LIPIcs.WABI.2023.11

Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

Authors: Yutong Qiu, Yihang Shen, and Carl Kingsford

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs will always yield optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics that estimate GTED efficiently.

Cite as

Yutong Qiu, Yihang Shen, and Carl Kingsford. Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 11:1-11:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{qiu_et_al:LIPIcs.WABI.2023.11,
  author =	{Qiu, Yutong and Shen, Yihang and Kingsford, Carl},
  title =	{{Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{11:1--11:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.11},
  URN =		{urn:nbn:de:0030-drops-186374},
  doi =		{10.4230/LIPIcs.WABI.2023.11},
  annote =	{Keywords: Integer Linear Programming, Genome Graphs, Flow Network, Graph Comparison}
}

Document

DOI: 10.4230/LIPIcs.WABI.2023.23

Reinforcement Learning for Robotic Liquid Handler Planning

Authors: Mohsen Ferdosi, Yuejun Ge, and Carl Kingsford

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

Robotic liquid handlers play a crucial role in automating laboratory tasks such as sample preparation, high-throughput screening, and assay development. Manually designing protocols takes significant effort, and can result in inefficient protocols and involve human error. We investigates the application of reinforcement learning to automate the protocol design process resulting in reduced human labor and further automation in liquid handling. We develop a reinforcement learning agent that can automatically output the step-by-step protocol based on the initial state of the deck, reagent types and volumes, and the desired state of the reagents after the protocol is finished. We show that finding the optimal protocol for solving a liquid handler instance is NP-complete, and we present a reinforcement learning algorithm that can solve the planning problem practically for cases with a deck of up to 20 × 20 wells and four different types of reagents. We design and implement an actor-critic approach, and we train our agent using the Impala algorithm. Our findings demonstrate that reinforcement learning can be used to automatically program liquid handler robotic arms, enabling more precise and efficient planning for the liquid handler and laboratory automation.

Cite as

Mohsen Ferdosi, Yuejun Ge, and Carl Kingsford. Reinforcement Learning for Robotic Liquid Handler Planning. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{ferdosi_et_al:LIPIcs.WABI.2023.23,
  author =	{Ferdosi, Mohsen and Ge, Yuejun and Kingsford, Carl},
  title =	{{Reinforcement Learning for Robotic Liquid Handler Planning}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{23:1--23:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.23},
  URN =		{urn:nbn:de:0030-drops-186494},
  doi =		{10.4230/LIPIcs.WABI.2023.23},
  annote =	{Keywords: Liquid Handler, Reinforcement Learning, Planning}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.WABI.2020

LIPIcs, Volume 172, WABI 2020, Complete Volume

Authors: Carl Kingsford and Nadia Pisanti

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

LIPIcs, Volume 172, WABI 2020, Complete Volume

Cite as

20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 1-360, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@Proceedings{kingsford_et_al:LIPIcs.WABI.2020,
  title =	{{LIPIcs, Volume 172, WABI 2020, Complete Volume}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{1--360},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020},
  URN =		{urn:nbn:de:0030-drops-127881},
  doi =		{10.4230/LIPIcs.WABI.2020},
  annote =	{Keywords: LIPIcs, Volume 172, WABI 2020, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.WABI.2020.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Carl Kingsford and Nadia Pisanti

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{kingsford_et_al:LIPIcs.WABI.2020.0,
  author =	{Kingsford, Carl and Pisanti, Nadia},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{0:i--0:x},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.0},
  URN =		{urn:nbn:de:0030-drops-127891},
  doi =		{10.4230/LIPIcs.WABI.2020.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/LIPIcs.WABI.2020.12

Exact Transcript Quantification Over Splice Graphs

Authors: Cong Ma, Hongyu Zheng, and Carl Kingsford

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes the length of single-end reads or paired-end fragments is fixed. We provide an improvement of this model to handle variable-length reads or fragments and incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-art transcript quantification method. This new approach is useful for modeling scenarios where reference transcriptome is incomplete or not available and can be further used in transcriptome assembly or alternative splicing analysis.

Cite as

Cong Ma, Hongyu Zheng, and Carl Kingsford. Exact Transcript Quantification Over Splice Graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{ma_et_al:LIPIcs.WABI.2020.12,
  author =	{Ma, Cong and Zheng, Hongyu and Kingsford, Carl},
  title =	{{Exact Transcript Quantification Over Splice Graphs}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.12},
  URN =		{urn:nbn:de:0030-drops-128013},
  doi =		{10.4230/LIPIcs.WABI.2020.12},
  annote =	{Keywords: RNA-seq, alternative splicing, transcript quantification, splice graph, network flow}
}

Document

DOI: 10.4230/LIPIcs.WABI.2019.15

Context-Aware Seeds for Read Mapping

Authors: Hongyi Xin, Mingfu Shao, and Carl Kingsford

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Abstract

Motivation: Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting t non-overlapping seeds in each read in order to find all valid mappings under an edit distance threshold of t. As t grows (such as in long reads with high error rate), this seeding scheme forces mappers to use more and shorter seeds, which increases the seed hits (seed frequencies) and therefore reduces the efficiency of mappers. Results: We propose a novel seeding framework, context-aware seeds (CAS). CAS guarantees finding all valid mapping but uses fewer (and longer) seeds, which reduces seed frequencies and increases efficiency of mappers. CAS achieves this improvement by attaching a confidence radius to each seed in the reference. We prove that all valid mappings can be found if the sum of confidence radii of seeds are greater than t. CAS generalizes the existing pigeonhole-principle-based seeding scheme in which this confidence radius is implicitly always 1. Moreover, we design an efficient algorithm that constructs the confidence radius database in linear time. We experiment CAS with E. coli genome and show that CAS reduces seed frequencies by up to 20.3% when compared with the state-of-the-art pigeonhole-principle-based seeding algorithm, the Optimal Seed Solver. Availability: https://github.com/Kingsford-Group/CAS_code

Cite as

Hongyi Xin, Mingfu Shao, and Carl Kingsford. Context-Aware Seeds for Read Mapping. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{xin_et_al:LIPIcs.WABI.2019.15,
  author =	{Xin, Hongyi and Shao, Mingfu and Kingsford, Carl},
  title =	{{Context-Aware Seeds for Read Mapping}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.15},
  URN =		{urn:nbn:de:0030-drops-110452},
  doi =		{10.4230/LIPIcs.WABI.2019.15},
  annote =	{Keywords: Read Mapping, Seed and Extend, Edit Distance, Suffix Trie}
}

Document

DOI: 10.4230/LIPIcs.WABI.2019.18

Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem

Authors: Yutong Qiu, Cong Ma, Han Xie, and Carl Kingsford

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Abstract

Transcriptomic structural variants (TSVs) - large-scale transcriptome sequence change due to structural variation - are common, especially in cancer. Detecting TSVs is a challenging computational problem. Sample heterogeneity (including differences between alleles in diploid organisms) is a critical confounding factor when identifying TSVs. To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangement Problem (MCAP), which seeks k genome rearrangements to maximize the number of reads that are concordant with at least one rearrangement. This directly models the situation of a heterogeneous or diploid sample. We prove that MCAP is NP-hard and provide a 1/4-approximation algorithm for k=1 and a 3/4-approximation algorithm for the diploid case (k=2) assuming an oracle for k=1. Combining these, we obtain a 3/16-approximation algorithm for MCAP when k=2 (without an oracle). We also present an integer linear programming formulation for general k. We characterize the graph structures that require k>1 to satisfy all edges and show such structures are prevalent in cancer samples. We evaluate our algorithms on 381 TCGA samples and 2 cancer cell lines and show improved performance compared to the state-of-the-art TSV-calling tool, SQUID.

Cite as

Yutong Qiu, Cong Ma, Han Xie, and Carl Kingsford. Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 18:1-18:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{qiu_et_al:LIPIcs.WABI.2019.18,
  author =	{Qiu, Yutong and Ma, Cong and Xie, Han and Kingsford, Carl},
  title =	{{Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{18:1--18:5},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.18},
  URN =		{urn:nbn:de:0030-drops-110483},
  doi =		{10.4230/LIPIcs.WABI.2019.18},
  annote =	{Keywords: transcriptomic structural variation, integer linear programming, heterogeneity}
}

Document

DOI: 10.4230/LIPIcs.WABI.2019.23

Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation

Authors: Natalie Sauerwald, Yihang Shen, and Carl Kingsford

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Abstract

Topological data analysis (TDA) is a mathematically well-founded set of methods to derive robust information about the structure and topology of data. It has been applied successfully in several biological contexts. Derived primarily from algebraic topology, TDA rigorously identifies persistent features in complex data, making it well-suited to better understand the key features of three-dimensional chromosome structure. Chromosome structure has a significant influence in many diverse genomic processes and has recently been shown to relate to cellular differentiation. While there exist many methods to study specific substructures of chromosomes, we are still missing a global view of all geometric features of chromosomes. By applying TDA to the study of chromosome structure through differentiation across three cell lines, we provide insight into principles of chromosome folding and looping. We identify persistent connected components and one-dimensional topological features of chromosomes and characterize them across cell types and stages of differentiation. Availability: Scripts to reproduce the results from this study can be found at https://github.com/Kingsford-Group/hictda

Cite as

Natalie Sauerwald, Yihang Shen, and Carl Kingsford. Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{sauerwald_et_al:LIPIcs.WABI.2019.23,
  author =	{Sauerwald, Natalie and Shen, Yihang and Kingsford, Carl},
  title =	{{Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{23:1--23:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.23},
  URN =		{urn:nbn:de:0030-drops-110537},
  doi =		{10.4230/LIPIcs.WABI.2019.23},
  annote =	{Keywords: topological data analysis, chromosome structure, Hi-C, topologically associating domains}
}

Search Results

Documents authored by Kingsford, Carl

Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

Abstract

Cite as

Reinforcement Learning for Robotic Liquid Handler Planning

Abstract

Cite as

LIPIcs, Volume 172, WABI 2020, Complete Volume

Abstract

Cite as

Front Matter, Table of Contents, Preface, Conference Organization

Abstract

Cite as

Exact Transcript Quantification Over Splice Graphs

Abstract

Cite as

Context-Aware Seeds for Read Mapping

Abstract

Cite as

Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem

Abstract

Cite as

Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation

Abstract

Cite as

Thanks for your feedback!

Could not send message